Making Jupyter notebooks Google Colab ready

Posted on Fri 07 June 2019 in python

Jupyter notebooks are great for sharing code, but the step from showing people the code you produced to letting them run it can be difficult. Notebooks that you run within your own environment usually rely on external code, which makes it hard to simply copy a notebook and run it. In an ideal world, we want people to be able to run our notebooks as easily as they can look at them.

Google Colab is a great bridge that lets you quickly run someone else's notebooks, but Colab still requires that the right code dependencies are available. Here is a quick fix that I have been using to easily run Colab notebooks that require external dependencies. An example can be seen in my TensorFlow 2 Generative Models repository.

There are basically two steps to making your Jupyter notebooks Colab-ready:

  1. Install the requisite dependencies
  2. Add a link to open the notebook in Colab

Installing dependencies

The easiest way to install packages in Colab is simply to run a shell command from a cell, e.g. !pip install numpy. However, you don't want to run that command when you're not in Colab, so you first need to test whether you are. To do this, all you need to do is check whether google.colab is among the loaded modules: "google.colab" in sys.modules. Since google.colab is a Colab-specific module, if it's available, you are in Colab.
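As a minimal sketch, the check can be stored once in a flag and reused by later cells. It relies on the Colab kernel having already imported google.colab, which is what the test described above assumes.

```python
import sys

# True only when the google.colab module is already loaded in this
# interpreter; outside Colab the key is absent and the flag is False.
IN_COLAB = "google.colab" in sys.modules

print("Running in Colab:", IN_COLAB)
```

Because the flag is a plain boolean, any cell that should only run in Colab can be guarded with `if IN_COLAB:`.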

So to install the requisite dependencies, let's first make a list of the dependencies we're going to need to install:

In [ ]:
colab_requirements = [
    "pip install tf-nightly-gpu-2.0-preview==2.0.0.dev20190513",
    "pip install tfp-nightly==0.7.0.dev20190508",
]

Then we test if we are in Colab, and if so, pip install the requirements (using subprocess):

In [ ]:
import sys, subprocess

# google.colab is only loaded inside a Colab runtime
IN_COLAB = "google.colab" in sys.modules

def run_subprocess_command(cmd):
    # run the command
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    # print the output
    for line in process.stdout:
        print(line.decode().strip())

if IN_COLAB:
    for i in colab_requirements:
        run_subprocess_command(i)
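One possible variation, not the approach used above, is to call pip through the interpreter that runs the notebook kernel and to stop on the first failed install. In this sketch the helper names `build_pip_command` and `install_requirements` and the `dry_run` parameter are hypothetical, and the function takes bare requirement strings (e.g. "numpy") rather than full "pip install ..." commands.

```python
import subprocess
import sys


def build_pip_command(requirement):
    """Return the argument list that installs one requirement with pip."""
    # sys.executable is the interpreter running the current kernel, so the
    # package is installed into the environment the notebook actually uses.
    return [sys.executable, "-m", "pip", "install", requirement]


def install_requirements(requirements, dry_run=False):
    """Install each requirement in order and return the commands used.

    With dry_run=True nothing is executed; the commands are only built.
    """
    commands = [build_pip_command(r) for r in requirements]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if pip exits with an error.
            subprocess.run(command, check=True)
    return commands


# Show what would run without installing anything.
for command in install_requirements(["numpy"], dry_run=True):
    print(" ".join(command))
```

In a notebook this would sit behind the same Colab check, for example `if "google.colab" in sys.modules: install_requirements([...])`.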

When code isn't available on PyPI, you can still install it: just pip install from the GitHub repo. To make a GitHub repo pip-installable, all you need to do is add a setup.py.

This can be really useful for submitting issues on GitHub. For example, if you're reporting a minimal reproducible example for a specific nightly version of a GitHub repo, you can just pip install that commit in Colab and reproduce your error. For example:

pip install git+https://github.com/YOURUSERNAME/YOURREPOSITORY.git@YOUR_COMMIT_HASH
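If you pin commits often, the install target can be assembled from its parts. This is a small sketch with a hypothetical helper name, `github_pip_target`; it uses the https:// form of the URL because GitHub no longer accepts the unauthenticated git:// protocol.

```python
def github_pip_target(user, repo, ref=None):
    """Build the string passed to `pip install` for a GitHub repo.

    ref may be a commit hash, branch, or tag; when omitted, pip installs
    the repository's default branch.
    """
    target = f"git+https://github.com/{user}/{repo}.git"
    if ref:
        target += f"@{ref}"
    return target


print("pip install", github_pip_target("YOURUSERNAME", "YOURREPOSITORY", "YOUR_COMMIT_HASH"))
```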

Adding a link to open the notebook in Colab is also really easy, and the link can even be rendered as a badge.

You just need to link to

https://colab.research.google.com/github/MYUSERNAME/MYREPOSITORY/blob/MYBRANCH/PATH/TO/MYNOTEBOOK.ipynb

For example, here is the code to create this link to a Variational Autoencoder Colab notebook:

Open In Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/timsainb/tensorflow2-generative-models/blob/master/1.0-Variational-Autoencoder-fashion-mnist.ipynb)
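For a repository with many notebooks, the link and badge can be generated rather than typed by hand. The sketch below follows the URL pattern shown above; the function names `colab_url` and `colab_badge` are hypothetical.

```python
COLAB_BASE = "https://colab.research.google.com"


def colab_url(user, repo, branch, notebook_path):
    """Return the Colab link for a notebook hosted on GitHub."""
    return f"{COLAB_BASE}/github/{user}/{repo}/blob/{branch}/{notebook_path}"


def colab_badge(user, repo, branch, notebook_path):
    """Return Markdown that renders the 'Open In Colab' badge as a link."""
    link = colab_url(user, repo, branch, notebook_path)
    return f"[![Open In Colab]({COLAB_BASE}/assets/colab-badge.svg)]({link})"


print(
    colab_badge(
        "timsainb",
        "tensorflow2-generative-models",
        "master",
        "1.0-Variational-Autoencoder-fashion-mnist.ipynb",
    )
)
```

The printed line can be pasted into a Markdown cell at the top of the notebook or into the repository README.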